PRESEMT Phrase Model Generator

Author

  • Michalis Troullinos
Abstract

The Phrasing model generator (PMG) uses the output of the Phrase aligner module (PAM) to train a phrasing model for the source language (SL). The PAM output contains the segmentation into phrases of the SL side of the bilingual corpus; specifically, the PMG takes as input the XML string representing the phrase-aligned SL side of the parallel corpus. The resulting model is then applied to segment any SL text that is input to the PRESEMT system for translation, so the PMG output consists of the SL texts to be translated by PRESEMT, segmented into phrases compatible with the phrasing model used in the target language (TL). This procedure is illustrated in Figure 1.
Figure 1: Overview of PMG
The method for extracting the phrasing model is statistical, since a substantial amount of research has already been invested in creating statistical language models for NLP tasks (e.g. [2]). Following comparative evaluations, it is based on the CRF (Conditional Random Fields) model [1]. The CRF model trained by this process is used by the PRESEMT machine translation system as a phrasal segmentation module for the SL. Its purpose is to group together words that form a complete phrase, as a pre-processing step before translation. In the translation process, the phrases created by the Phrasing model generator are used to generate a segmentation of the input sentence and, based on that, to search for appropriate translations.
One main requirement for the methodology used to develop this module is that it be language-independent: the proposed model should adapt to any language, provided that a training set of acceptable quality is available in the form of a parallel corpus of sentences coupled with a parser that splits TL-side sentences into phrases.
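The grouping step can be pictured as decoding per-word labels into phrases. The following is only a minimal sketch under stated assumptions: the abstract does not specify the label scheme, so the "B" (begin phrase) and "I" (inside phrase) labels, the class name `PhraseGrouper`, and the example sentence are all hypothetical, and the labels are supplied by hand rather than predicted by a trained CRF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: turns per-word labels into phrase groups.
// In a real system the labels would come from a trained sequence model.
class PhraseGrouper {

    /**
     * Groups tokens into phrases from per-token labels.
     * "B" opens a new phrase and "I" continues the current one.
     * This label scheme is an assumption, not taken from the source.
     */
    static List<List<String>> group(List<String> tokens, List<String> labels) {
        if (tokens.size() != labels.size()) {
            throw new IllegalArgumentException("tokens and labels must have equal length");
        }
        List<List<String>> phrases = new ArrayList<>();
        List<String> current = null;
        for (int i = 0; i < tokens.size(); i++) {
            // Open a new phrase on "B", or when no phrase is open yet.
            if (labels.get(i).equals("B") || current == null) {
                current = new ArrayList<>();
                phrases.add(current);
            }
            current.add(tokens.get(i));
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "old", "man", "crossed", "the", "street");
        List<String> labels = List.of("B", "I", "I", "B", "B", "I");
        // prints [[the, old, man], [crossed], [the, street]]
        System.out.println(group(tokens, labels));
    }
}
```

Treating segmentation as per-word labelling is what makes a sequence model such as a CRF applicable: the model only has to predict one label per word, and the phrase boundaries follow from the label sequence.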
For implementation, the MALLET package was chosen because it is written in Java (the programming language favored in the PRESEMT project) and has more extensive support than other existing software implementations of CRF. Additionally, the Spring framework was used for parameter passing and object injection. Spring made running experiments easier, since each set of experiments only requires creating a set of XML configuration files, with no need to modify Java code. Finally the training method used …

Similar resources

Comparing CRF and template-matching in phrasing tasks within a Hybrid MT system

The present article focuses on improving the performance of a hybrid Machine Translation (MT) system, namely PRESEMT. The PRESEMT methodology is readily portable to new language pairs, and allows the creation of MT systems with minimal reliance on expensive resources. PRESEMT is phrase-based and uses a small parallel corpus from which to extract structural transformations from the source langua...

Implementing a Language-Independent MT Methodology

The current paper presents a language-independent methodology, which facilitates the creation of machine translation (MT) systems for various language pairs. This methodology is implemented in the PRESEMT hybrid MT system. PRESEMT has the lowest possible requirements on specialised resources and tools, given that for many languages (especially less widely used ones) only limited linguistic resou...

PRESEMT: Pattern Recognition-based Statistically Enhanced MT

This document contains a brief presentation of the PRESEMT project, which aims at the development of a novel language-independent methodology for the creation of a flexible and adaptable MT system.

Evaluating the Translation Accuracy of a Novel Language-Independent MT Methodology

The current paper evaluates the performance of the PRESEMT methodology, which facilitates the creation of machine translation (MT) systems for different language pairs. This methodology aims to develop a hybrid MT system that extracts translation information from large, predominantly monolingual corpora, using pattern recognition techniques. PRESEMT has been designed to have the lowest possible...

Language-independent hybrid MT with PRESEMT

The present article provides a comprehensive review of the work carried out on developing PRESEMT, a hybrid language-independent machine translation (MT) methodology. This methodology has been designed to facilitate rapid creation of MT systems for unconstrained language pairs, setting the lowest possible requirements on specialised resources and tools. Given the limited availability of resourc...

Publication date: 2013